Schemata as Building Blocks: Does Size Matter? 



C. R. Stephens* and H. Waelbroeck†

Instituto de Ciencias Nucleares, UNAM 
Circuito Exterior, A. Postal 70-543 
Mexico D.F. 04510 
R. Aguirre‡
DEPFI, UNAM, A. Postal 70-543 
Mexico D.F. 04510 



Abstract 



We analyze the schema theorem and the building block hypothesis using a recently 
derived, exact schemata evolution equation. We derive a new schema theorem 
based on the concept of effective fitness showing that schemata of higher than av- 
erage effective fitness receive an exponentially increasing number of trials over time. 
The building block hypothesis is a natural consequence in that the equation shows 
how fit schemata are constructed from fit sub-schemata. However, we show that 
generically there is no preference for short, low-order schemata. In the case where 
schema reconstruction is favoured over schema destruction large schemata tend to 
be favoured. As a corollary of the evolution equation we prove Geiringer's theorem. 
We give supporting numerical evidence for our claims in both non-epistatic and
epistatic landscapes.



1 Introduction 

A very large proportion of scientific endeavour has been associated with the question: What
are "things" made of? The reason is that answering it is an indispensable requirement for
understanding how and why a "thing" functions. The answer has always tended to be: "things"
are made of other, more elementary "things". In the physical sciences this is obvious: a

* e-mail: stephens@nuclecu.unam.mx
† e-mail: hwael@nuclecu.unam.mx
‡ e-mail: rosalia@nuclecu.unam.mx



table is made up of atoms, which in turn are made up of electrons and a nucleus, which in its 
turn ... In biology a living organism (generally) is composed of various organs and tissues, 
which in their turn are made of various types of cells, which in their turn are composed 
of various constituents such as nuclei, protoplasm, mitochondria etc. From the nucleus we 
pass to chromosomes, genes, DNA and RNA etc. The latter, along with other important
ingredients, form an elaborate "computer programme" for the construction of the organism.
In computer science high level languages are composed of more elementary languages
until we arrive at the most basic machine level language recognized by the computer itself. 

What have all these examples in common? They show that all things are made out of 
"building blocks" , whether they be tables, giraffes or computer programmes. Inevitably 
there exists a hierarchy of building blocks, the hierarchy being ordered more often than not 
according to scale and complexity. One can think of building blocks as the "effective degrees 
of freedom" (EDOF) of a system, which in their turn are composed of more "fine grained" , 
elementary degrees of freedom. For complex systems the former are composed of very large 
numbers of the latter. What building blocks one uses to describe a system depends very 
much on what one wants to say about it. In particular, on how fine grained a description 
one requires. Almost always a more coarse grained description will suffice. 

So what has the above to do with genetic algorithms (GAs)? At the most basic level all 
the above can be coded as bit strings (of course) and in some way or other be associated 
with the notion of adaptation and optimization of some "fitness" function in a complicated 
environment. Trying to understand these problems in adaptation and optimization at the 
level of the fundamental, microscopic degrees of freedom is prohibitively difficult: there are 
simply too many and, more often than not, they interact in a highly non-linear fashion. 
In order to describe these systems both qualitatively and quantitatively one needs to know 
how EDOF emerge. In particular, in the context of GAs, if we coded the above problems as 
such how would some of the known EDOF emerge? In GAs one can universally represent 
EDOF as schemata. Of course, not all are of equal utility. EDOF if they are to be useful in 
the description of a system must display a certain degree of integrity. 

The question is: what schemata are utilized by a GA? Or rather, what are the typical
properties possessed by successful schemata? Naturally, the answer to this question will
depend on the fitness landscape of interest. However, one might enquire whether or not
such properties exist in generic classes of landscapes. In fact theory has tried to be even
more ambitious. The schema theorem and the related building block hypothesis ||, 
propose that the EDOF in GAs are fit, short schemata irrespective of the landscape! This 
is an extremely strong statement. The fact that fit schemata are preferred is intuitively 
understandable, though we will see some counterexamples to this later, whilst the purported 
preference for short schemata is a supposed consequence of the destructive effect of crossover. 

In this paper we will investigate theoretically and experimentally the evolution of schemata 
and in particular how this evolution depends on the defining length of the schemata. Our 
theoretical analysis will be based on an exact evolution equation jo], [l^, [l| for schemata 
for the case of proportional selection and 1-point crossover. In section 2 we will give a 
brief overview of the equation. In section 3 we will discuss some of its more important 
theoretical ramifications and in section 4 we will show how numerical experiments confirm 
the theoretical predictions. Finally in section 5 we will draw some conclusions. 



2 Schema Equation 



In this section we give without proof (see jyj, |L2| for more details) the schema evolution
equation for the relative proportion, $P(\xi, t) = n(\xi, t)/n$, of the schema $\xi$ of defining length $l$
and order $N_2$ at time $t$ in a canonical GA of population $n$ consisting of chromosomes of $N$
bits evolving with respect to proportional selection, point mutation and 1-point crossover.
In the limit $n \to \infty$, $P(\xi, t)$ gives the probability of finding the schema $\xi$ at time $t$. Explicitly

$$P(\xi, t+1) = \mathcal{P}(\xi)\, P_c(\xi, t) + \sum_{\xi_i} \mathcal{P}(\xi_i \to \xi)\, P_c(\xi_i, t) \qquad (1)$$

where the sum is over all schemata, $\xi_i$, that differ by at least one bit in one of the defining
bits of the schema $\xi$, and the effective mutation coefficients $\mathcal{P}(\xi)$ and $\mathcal{P}(\xi_i \to \xi)$ represent
the probability that the schema $\xi$ remains unmutated and the probability that the schema
$\xi_i$ mutates to the schema $\xi$ respectively. $P_c(\xi, t)$ is the mean proportion of schemata $\xi$ at
time $t$ after selection and crossover. Explicitly

$$P_c(\xi, t) = P'(\xi, t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P'(\xi, t) - P'(\xi_L, t)\, P'(\xi_R, t) \right) \qquad (2)$$

where $P'(\xi, t) = (f(\xi, t)/\bar{f}(t))\,P(\xi, t)$, $f(\xi, t)$ being the mean fitness of the schema $\xi$ and $\bar{f}(t)$
the average population fitness. $p_c$ is the crossover probability and $k$ the crossover point. The
quantities $P'(\xi_L, t)$ and $P'(\xi_R, t)$ are defined analogously to $P'(\xi, t)$ but refer to the schemata
$\xi_L$ and $\xi_R$, which are the parts of $\xi$ to the left and right of $k$ respectively. One can illustrate
the content of the equation with a simple diagrammatic example: ****1**|0**1*****
is a schema with $l = 7$ and $N_2 = 3$. The crossover point is at $k = 7$, hence $\xi_L$ has $N_2 = 1$
and $l = 1$ while $\xi_R$ has $N_2 = 2$ and $l = 4$. Note that the equation takes into account
exactly both the effects of schema destruction and schema reconstruction. At the level of 
strings the equation will be equivalent to other exact formulations ^, [ij], pi Ei, 15, lfi] , and 



is most closely related to the analogous equation for a canonical GA evolving with respect 
to proportional selection and recombination derived by Altenberg |jj based on earlier work 
in genetics by Karlin and Liberman 
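The diagrammatic example above can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes the defining length is counted inclusively (the number of positions spanned by the outermost defined bits), which is the convention that reproduces the values quoted in the text, and it reads the crossover point as a position along the string. The function names `order`, `defining_length` and `split` are illustrative.

```python
def order(schema):
    """Number of defined (non-'*') positions, N_2 in the text."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema):
    """Inclusive span between the outermost defined positions (0 if none)."""
    defined = [i for i, ch in enumerate(schema) if ch != "*"]
    return defined[-1] - defined[0] + 1 if defined else 0

def split(schema, cut):
    """Split after string position `cut` (1-based), padding each part with '*'."""
    left = schema[:cut] + "*" * (len(schema) - cut)
    right = "*" * cut + schema[cut:]
    return left, right

schema = "****1**0**1*****"
left, right = split(schema, 7)
print(order(schema), defining_length(schema))   # 3 7
print(order(left), defining_length(left))       # 1 1
print(order(right), defining_length(right))     # 2 4
```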

There are several notable features of the above equation: first of all it implies that crossover 
as an operator imposes the idea of a schema. This can easily be seen by considering the 
above equation for the case where $\xi$ is the entire string, $C_i$. The reconstruction probability
depends on the relative fitness of strings that contain the constituent elements, $C_i^L$ and $C_i^R$
of $C_i$, but given that there can be many strings that contain $C_i^L$ or $C_i^R$ one must take an
average over these strings. In this sense we are averaging over the "degrees of freedom"
represented by the bits that are not contained in $C_i^L$ or $C_i^R$. The equation also shows that
the effects of reconstruction will outweigh destruction if the parts of a string are more
selected than the whole. This is closely related to the notion of linkage disequilibrium from
population genetics. However, linkage disequilibrium there is measured by the covariance,
$C(\xi_L, \xi_R) = P(\xi, t) - P(\xi_L, t)P(\xi_R, t)$, of the relative frequencies of $\xi_L$ and $\xi_R$. Here, we
see that the relevant measure of schemata growth is the covariance, $C'(\xi_L, \xi_R) = P'(\xi, t) -
P'(\xi_L, t)P'(\xi_R, t)$, of the fitness-weighted relative frequencies of $\xi_L$ and $\xi_R$. Clearly $C'$ does
not have to be of the same sign as $C$.
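As a small numerical illustration of the last remark, the sketch below computes both covariances for a 2-bit schema. The population frequencies and the additive (counting ones) fitness values are assumed purely for illustration and are not taken from the paper; exact fractions are used so the signs are unambiguous.

```python
from fractions import Fraction as F

def covariances(P, fitness):
    """Return (C, C') for the schema 11 with parts 1* and *1."""
    mean_f = sum(fitness[s] * P[s] for s in P)
    Pw = {s: fitness[s] * P[s] / mean_f for s in P}   # P' in the text

    def cov(dist):
        left = dist["11"] + dist["10"]    # frequency of 1*
        right = dist["11"] + dist["01"]   # frequency of *1
        return dist["11"] - left * right

    return cov(P), cov(Pw)

# Assumed example: slightly positively correlated population, additive fitness.
P = {"11": F(13, 50), "00": F(13, 50), "10": F(12, 50), "01": F(12, 50)}
fitness = {"11": 2, "10": 1, "01": 1, "00": 0}
C, C_prime = covariances(P, fitness)
print(C, C_prime)   # 1/100 -36/625
```

Here the raw covariance is positive while the fitness-weighted one is negative, so in this assumed population reconstruction would outweigh destruction even though the bits are positively correlated before selection.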

The equation also shows an hierarchy in structure both in complexity and size and also 
in time. This is obvious in the very nature of the equation which relates a schema $\xi$ to
its "building blocks" of lower order and smaller defining length, $\xi_L$ and $\xi_R$. These in turn
are related to even smaller, lower order building blocks $\xi_{LL}$, $\xi_{LR}$, $\xi_{RL}$ and $\xi_{RR}$ which are
associated with reconstruction of the schemata $\xi_L$ and $\xi_R$. The hierarchy terminates at
1-schemata, i.e. schemata with $N_2 = 1$, which are immune to the effects of crossover. The
hierarchical nature of the evolution in terms of complexity and size is manifest in (2) as
schemata of a certain "size" (defining length and order) are related to schemata of smaller
size, which in their turn are related to yet smaller schemata etc. The structure is also
temporal as schemata at time $t$ are related to smaller schemata at time $t - 1$, which in turn
are related to yet smaller schemata at time $t - 2$ etc.

Note the "form invariance" of the equation under a coarse graining. What does this mean? 
Consider the equation at the level of a complete string. As we have pointed out, the very 
notion of recombination introduces a coarse graining in that we can write the recombination 
contribution in terms of $P(C_i^L, t)$ and $P(C_i^R, t)$, which involve summing over the microscopic
(bit) degrees of freedom of $\{C_i - C_i^L\}$ and $\{C_i - C_i^R\}$ respectively, where $\{\,-\,\}$ denotes set
difference. Coarse graining here simply means that we have forfeited detailed knowledge
about the microscopic degrees of freedom not contained in $C_i^L$ or $C_i^R$. The corresponding
evolution equations for $C_i^L$ and $C_i^R$ involve recombination terms where a further coarse
graining must be carried out via a summation, for example, over the microscopic degrees of
freedom of $\{C_i^L - C_i^{LL}\}$ in the case of the schema $C_i^{LL}$. However, irrespective of the degree
of coarse graining the form of the evolution equation remains exactly the same. 

We see then that a general form of the building block hypothesis is inherent in the very struc- 
ture of the evolution equation for a canonical GA. Recombination builds complex schemata 
from more primitive constituents which in turn are constructed from yet more elementary
building blocks until we arrive at the ultimate building blocks: 1-schemata. The question
of whether a GA utilizes building blocks to find a good solution can be seen to be related
to whether schema reconstruction or destruction is the most important effect. However,
this is clearly not the only criterion. We have said that the evolution equation contains
a generalized form of the building block hypothesis: simply that larger, more complex schemata
are constructed from more primitive building blocks irrespective of whether they are fit
schemata utilized by the GA in finding fit chromosomes. This is not however the only way
to grow a schema. Perhaps the schema was in the initial population and was of high
fitness. To see whether indeed a GA uses fit, short building blocks as the standard building
block hypothesis and Schema theorem purport we must examine more closely the idea of
fitness.

3 What do we mean by fit? 

Why are certain schemata preferred over others? Because they are fitter of course. But 
what does one really mean by this statement? Consider the following contrived but instructive
example: a 2-schema problem, i.e. $N_2 = 2$, with crossover but neglecting
mutation, and with a fitness landscape where $f(01) = f(10) = 0$ and $f(11) = f(00) = 1$.
The steady state solution of the schema evolution equation is

$$P(11) = P(00) = \frac{1}{2}\left(1 - \frac{p_c}{2}\,\frac{l-1}{N-1}\right), \qquad P(01) = P(10) = \frac{p_c}{4}\,\frac{l-1}{N-1} \qquad (3)$$

For $l = N$ and $p_c = 1$ we see that half the steady state population is composed of strings
that have zero fitness! Note that this fixed point is in fact a stable one. Such results lead



one to doubt whether the concept of fitness is the most relevant one in gauging the growth 
of a schema. 
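The steady state of this first example can be reproduced by iterating selection followed by 1-point crossover for the four 2-schemata. This is a minimal sketch under stated assumptions: an infinite population, no mutation, a symmetric initial population with all four schemata equally frequent, and $p_c(l-1)/(N-1)$ taken as the probability that a cut falls between the two defining bits. The function names are illustrative.

```python
def step(P, pc, l, N):
    """One generation of proportional selection followed by 1-point crossover."""
    fitness = {"11": 1.0, "00": 1.0, "01": 0.0, "10": 0.0}
    mean_f = sum(fitness[s] * P[s] for s in P)
    Pw = {s: fitness[s] * P[s] / mean_f for s in P}       # P' in the text
    left = {"1": Pw["11"] + Pw["10"], "0": Pw["00"] + Pw["01"]}
    right = {"1": Pw["11"] + Pw["01"], "0": Pw["00"] + Pw["10"]}
    c = pc * (l - 1) / (N - 1)        # chance that a cut separates the two bits
    return {s: Pw[s] - c * (Pw[s] - left[s[0]] * right[s[1]]) for s in P}

def steady_state(pc, l, N, generations=20):
    P = {s: 0.25 for s in ("11", "00", "01", "10")}       # symmetric start (assumption)
    for _ in range(generations):
        P = step(P, pc, l, N)
    return P

P = steady_state(pc=1.0, l=8, N=8)
print(P["01"] + P["10"])   # fraction of zero-fitness schemata
```

With $l = N$ and $p_c = 1$ the zero-fitness schemata 01 and 10 together make up half the population, even though selection removes them completely in every generation.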

Another, more relevant, example is associated with a GA of binary alleles with mutation 
and proportional selection but without crossover (the Eigen model || ). Consider a "needle-in-a-haystack"
type fitness landscape where there exists one string, $C_m$, the "master
sequence", of high fitness, all the rest being of equal low fitness. When the mutation rate,
$\mu$, is zero then the steady state population, assuming $n$ is large, is such that $P(C_m) = 1$.
When $\mu > 0$ but small, $P(C_m, t \to \infty) \neq 1$. However, the population is clustered
around the master sequence in that the Hamming distance between the master sequence
and the large majority of other strings in the steady state population is small. Increasing
$\mu$ one reaches a critical value, $\mu_c$, beyond which $P(C_m, t \to \infty) \to 1/2^N$, which is exactly
the proportion expected in a completely random population. The sharp phase transition
at $\mu_c$ is familiar from thermodynamics being due to the competition between "energy"
(selection) and "entropy" (mutation). In fact, the quantity $-(1/2)\ln(\mu/(1-\mu))$ is the
precise analog of the inverse thermodynamic temperature. For $\mu > \mu_c$ the evolution of the GA
cannot be well understood by thinking of evolution on the needle-in-a-haystack landscape.
Thermodynamically this corresponds to considering the energy landscape as opposed to the
more physically relevant free energy landscape. In the entropy dominated regime for $\mu > \mu_c$
every state has the same free energy hence there is effectively no selection acting. Once 
again we are led to call into question the usefulness of the standard notion of fitness. 
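The trend described here can be illustrated with a small deterministic iteration of selection and point mutation over all $2^N$ strings. This is only a sketch: the string length, the master fitness of 10 and the number of iterations are assumed values, and for such a short string the transition is smooth rather than sharp, so the code shows the loss of the master sequence with increasing $\mu$ rather than a true phase transition.

```python
from itertools import product

def eigen_master_frequency(mu, N=5, f_master=10.0, steps=100):
    """Frequency of the master string after `steps` rounds of selection + mutation."""
    strings = list(product((0, 1), repeat=N))
    master = (1,) * N
    fitness = [f_master if s == master else 1.0 for s in strings]
    # Probability of mutating between two strings depends only on Hamming distance d.
    q = [mu**d * (1 - mu) ** (N - d) for d in range(N + 1)]
    dist = [[sum(a != b for a, b in zip(s, r)) for r in strings] for s in strings]
    size = len(strings)
    P = [1.0 / size] * size                      # start from a random population
    for _ in range(steps):
        mean_f = sum(f * p for f, p in zip(fitness, P))
        selected = [f * p / mean_f for f, p in zip(fitness, P)]
        P = [sum(q[dist[i][j]] * selected[j] for j in range(size)) for i in range(size)]
    return P[strings.index(master)]

for mu in (0.0, 0.05, 0.5):
    print(mu, eigen_master_frequency(mu))
```

At $\mu = 1/2$ every string is equally likely after mutation, so the master frequency equals $1/2^N$ regardless of its fitness advantage.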

As a third simple example consider the effect of mutation without crossover in the context 
of a model that consists of 2-schemata, 11, 01, 10, 00, where each schema can mutate to 
the two adjacent ones when the states 11, 10, 00, 01 are placed clockwise on a circle. For 
example, 11 can mutate to 10 or 01 but not to 00. We assume a simple degenerate fitness 
landscape: $f(11) = f(01) = f(10) = 2$, $f(00) = 1$. Clearly there is no selective advantage
for any one of the three degenerate schemata over the others. In a random population,
$P(11) = \ldots = P(00) = 1/4$. If there is uniform probability, $\mu$, for each schema to mutate to
an adjacent one then the evolution equation that describes this system is

$$P(i, t+1) = (1 - 2\mu)\,P'(i, t) + \mu\,(P'(i-1, t) + P'(i+1, t)) \qquad (4)$$

For $\mu = 0$ the steady state population is $P(11) = P(01) = P(10) = 1/3$, $P(00) = 0$. Thus we
see the synonym symmetry of the landscape associated with the degeneracy of the states 11,
10 and 01 is unbroken. However, for $\mu > 0$, the schema distribution at $t = 1$ starting from a
random distribution at $t = 0$ is $P(11) = 2/7$, $P(01) = P(10) = (2-\mu)/7$, $P(00) = (1+2\mu)/7$.
Thus, we see that there is an induced breaking of the landscape synonym symmetry due to 
the effects of mutation. In other words the population is induced to flow along what in the 
fitness landscape is a flat direction. 
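The one-step distribution quoted above follows directly from equation (4) and can be verified with exact arithmetic. The sketch below assumes only what the text states (the clockwise ring 11, 10, 00, 01 and the fitness values given); the mutation rate $\mu = 1/10$ is an arbitrary illustrative choice.

```python
from fractions import Fraction as F

RING = ["11", "10", "00", "01"]                  # clockwise order from the text
FITNESS = {"11": 2, "10": 2, "01": 2, "00": 1}

def step(P, mu):
    """Selection followed by mutation to the two neighbours on the ring, eq. (4)."""
    mean_f = sum(FITNESS[s] * P[s] for s in RING)
    Pw = {s: FITNESS[s] * P[s] / mean_f for s in RING}    # P' after selection
    new = {}
    for i, s in enumerate(RING):
        prev_s, next_s = RING[i - 1], RING[(i + 1) % 4]
        new[s] = (1 - 2 * mu) * Pw[s] + mu * (Pw[prev_s] + Pw[next_s])
    return new

mu = F(1, 10)
P1 = step({s: F(1, 4) for s in RING}, mu)
print(P1)   # 11 -> 2/7, 10 and 01 -> (2 - mu)/7, 00 -> (1 + 2 mu)/7
```

The schema 11 keeps the frequency 2/7 because both of its neighbours are equally fit, whereas 10 and 01 each lose weight to their less fit neighbour 00.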

Why is it that the examples above lead us to reconsider the idea of fitness? Fitness is 
intrinsically associated in the standard picture with a particular genetic operator — selec- 
tion. In the above we have two examples that exhibit regimes where other genetic operators, 
crossover and mutation respectively, can dominate and another example wherein populations 
are forced to flow along directions with zero selection gradient. In such regimes intuition 
gleaned from the normal fitness landscape is of little value. Clearly what is required is a 
generalization of fitness, a type of "effective fitness" , that treats the various genetic oper- 
ators on a more democratic footing and where population flows take place in an "effective 
fitness" landscape. In the case of selection and mutation the problem can be reformulated 



into a problem in equilibrium thermodynamics pC[ hence the standard thermodynamic free 
energy may be utilized. In more general circumstances one must find a more general effective 
fitness. 

A natural candidate for an effective fitness has been given in jD], |l2|, Specifically, 

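$$P(\xi, t+1) = \frac{f_{\rm eff}(\xi, t)}{\bar{f}(t)}\,P(\xi, t) \qquad (5)$$

Comparing with equation (1) one finds

$$f_{\rm eff}(\xi, t) = \mathcal{P}(\xi)\left(\frac{P_c(\xi, t)}{P(\xi, t)}\right)\bar{f}(t) + \sum_{\xi_i} \mathcal{P}(\xi_i \to \xi)\left(\frac{P_c(\xi_i, t)}{P(\xi, t)}\right)\bar{f}(t) \qquad (6)$$

This definition is very natural from an evolutionary viewpoint as it gives a direct measure
of the reproductive success of a given schema. In the limit $\mu \to 0$, $p_c \to 0$ one finds that
$f_{\rm eff}(\xi, t) \to f(\xi, t)$.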

Using the concept of effective fitness one can much better understand the three examples 
given earlier. For instance, in the first example, although the schemata 01 and 10 have 
zero fitness their effective fitness is non-zero. Similarly, in the third example although the 
schemata 11, 10 and 01 are degenerate in terms of fitness this degeneracy is lifted by the 
effect of mutation and this is manifest at the level of the effective fitness. The above also 
leads to the idea of an effective selection coefficient, $s_{\rm eff} = f_{\rm eff}(\xi, t)/\bar{f}(t) - 1$, that measures
directly selective pressure including the effects of genetic operators other than selection. If
we think of $s_{\rm eff}$ as being approximately constant in the vicinity of time $t_0$, then $s_{\rm eff}(t_0)$ gives
us the exponential rate of increase or decrease of growth of the schema $\xi$ at time $t_0$. In the
limit of a continuous time evolution the solution of (5) is

$$P(\xi, t) = P(\xi, 0)\, e^{\int_0^t s_{\rm eff}\, dt'} \qquad (7)$$
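To make the lifting of the degeneracy in the third example concrete, the effective fitness can be read off from two consecutive distributions by solving the defining relation for $f_{\rm eff}$. The sketch below uses the random initial population and the one-step distribution quoted earlier in the text; the value $\mu = 1/10$ and the helper name `effective_fitness` are illustrative assumptions.

```python
from fractions import Fraction as F

def effective_fitness(P_now, P_next, mean_f):
    """f_eff(s) = mean_f * P(s, t+1) / P(s, t), i.e. equation (5) solved for f_eff."""
    return {s: mean_f * P_next[s] / P_now[s] for s in P_now}

mu = F(1, 10)                                     # illustrative mutation rate
P0 = {s: F(1, 4) for s in ("11", "10", "01", "00")}
P1 = {"11": F(2, 7), "10": (2 - mu) / 7, "01": (2 - mu) / 7, "00": (1 + 2 * mu) / 7}
mean_f = F(7, 4)                                  # (2 + 2 + 2 + 1) / 4 for the random population

f_eff = effective_fitness(P0, P1, mean_f)
s_eff = {s: f_eff[s] / mean_f - 1 for s in f_eff}  # effective selection coefficient
print(f_eff)   # 11 -> 2, 10 and 01 -> 2 - mu, 00 -> 1 + 2 mu
```

Although 11, 10 and 01 all have fitness 2, their effective fitnesses are $2$, $2 - \mu$ and $2 - \mu$, so 11 has the largest effective selection coefficient at this time step.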

Using the evolution equation and the concept of effective fitness one can formulate a new 
Schema theorem that unlike the standard schema theorem is an equality rather than an 
inequality 



Schema Theorem 



$$P(\xi, t+1) = \frac{f_{\rm eff}(\xi, t)}{\bar{f}(t)}\,P(\xi, t) \qquad (8)$$



The interpretation of this equation is clear and analogous to the old schema theorem: 
schemata of higher than average effective fitness will be allocated an "exponentially" increasing
number of trials over time. We put the word exponentially in quotes as the real
exponent, $\int s_{\rm eff}\, dt'$, is not, except for very simple cases such as a flat fitness landscape, of
the form $at$, where $a$ is a constant. The above says much more than the standard schema
theorem: first of all it is an exact equation not just a lower bound; secondly it gives a deeper 
insight into the role of crossover. This comes about because the equation takes into account 
schema creation. The standard schema theorem emphasizes only the destructive effect of 
crossover. 



Armed with the above we can much more readily investigate the reconstructive aspect of 
crossover. In fact we will see generically that it is a more important effect. This is of 
great relevance in the link between the schema theorem and the building block hypothesis. 
The former gives a quantitative bite to the latter by showing that the destructive effects 
of crossover are greater the longer the defining length of the schema; thus leading to the 
notion that short, fit schemata are favoured. We can readily see that this is not generally 
true. If schema reconstruction outweighs that of destruction then crossover is more positive 
the longer the schema. Under such circumstances long, fit schemata are favoured. We will 
see this confirmed experimentally in the next section. Finally, we may remark that unless 
the fitness landscape warrants it there is no reason to think of a building block as a "local" 
object, as seems to be the case in the work on Royal Road functions j| where attempts 
were made to validate the building block hypothesis by giving high fitness to a very small 
number of states associated with localized blocks of 1s.

Another natural definition of effective fitness which leads us to a very simple proof of 
Geiringer's theorem |^| follows from splitting the evolution equation into those terms that 
are linear in $P(\xi, t)$ and those that are independent of it. For instance, in the case of selection
and crossover we have

$$P(\xi, t+1) = \frac{f'_{\rm eff}(\xi, t)}{\bar{f}(t)}\,P(\xi, t) + j(t) \qquad (9)$$

where $f'_{\rm eff}(\xi, t) = \left(1 - p_c\frac{l-1}{N-1}\right) f(\xi, t)$ and $j(t) = \frac{p_c}{N-1}\sum_{k=1}^{l-1} P'(\xi_L, t)\,P'(\xi_R, t)$. The corresponding
effective selection coefficient is $s'_{\rm eff} = \left(1 - p_c\frac{l-1}{N-1}\right)\frac{f(\xi, t)}{\bar{f}(t)} - 1$. In the limit of a
continuous time evolution (9) may be formally integrated to yield

$$P(\xi, t) = e^{\int_0^t s'_{\rm eff}(t')\,dt'}\,P(\xi, 0) + e^{\int_0^t s'_{\rm eff}(t')\,dt'} \int_0^t j(t')\, e^{-\int_0^{t'} s'_{\rm eff}(t'')\,dt''}\,dt' \qquad (10)$$

In a flat landscape $s'_{\rm eff} = -p_c(l-1)/(N-1)$, hence

$$P(\xi, t) = e^{-p_c\frac{l-1}{N-1}t}\,P(\xi, 0) + \frac{p_c}{N-1}\,e^{-p_c\frac{l-1}{N-1}t} \sum_{k=1}^{l-1}\int_0^t P'(\xi_L, t')\,P'(\xi_R, t')\,e^{p_c\frac{l-1}{N-1}t'}\,dt' \qquad (11)$$

Notice that dependence on the initial condition, $P(\xi, 0)$, is exponentially damped unless $\xi$
happens to be a 1-schema, the solution of the 1-schemata equation being $P(i, t) = P(i, 0)$.
An immediate consequence is that when considering the source term describing reconstruction
the only non-zero terms that need to be taken into account are those which arise from
1-schemata, as any higher order term will always have an accompanying exponential damping
factor. Thus, we see that the fixed point distribution for a GA with crossover evolving
in a flat fitness landscape is

$$P^*(\xi) = \lim_{t \to \infty} P(\xi, t) = \prod_{i=1}^{N_2} P(i, 0) \qquad (12)$$



which is basically Geiringer's Theorem in the context of schema distributions and simple 
crossover. We see here that the theorem appears in an extremely simple way as a consequence 



of the solution of the evolution equation. Note that the fixed point distribution arises purely 
due to the effects of recombination, the long time behaviour depending only on the initial 
distribution of the most elementary building blocks — the 1-schemata. 

Geiringer's theorem will also hold in a non-flat landscape if selection is only very weak, 
where what we mean by weak is that $f(\xi, t)/\bar{f}(t) \sim (1 + \epsilon)$ and $\epsilon < \left(p_c\frac{l-1}{N-1}\right)/\left(1 - p_c\frac{l-1}{N-1}\right)$
$\forall\, l > 1$. In this case anything other than a 1-schema will once again
be exponentially damped. In this case however due to the non-trivial landscape certain
1-schemata are preferred over others. A concrete example of such a landscape is $f_i = 1 + \alpha_i$
where $\sum_i |\alpha_i| < \epsilon/(2 + \epsilon)$ and $f_i$ is the fitness of the $i$th bit. Note there is no need to
restrict to a linear fitness function here, arbitrary epistasis is allowed as long as it does not
lead to large fitness deviations away from the mean. In this case $f(\xi, t)/\bar{f}(t) < 1 + \epsilon$.

So what can we glean from our evolution equation in terms of schema size vis a vis the 
building block hypothesis? For a flat fitness landscape the effective selection coefficient for 
a 2-schema $ij$ is

$$s_{\rm eff} = -p_c\left(\frac{l-1}{N-1}\right) + p_c\left(\frac{l-1}{N-1}\right)\frac{P(i)P(j)}{P(ij)} \qquad (13)$$

One sees that schema reconstruction will exceed that of schema destruction if $i$ and $j$ are
negatively correlated, i.e. if the linkage disequilibrium is negative. When will this be the
case? The fitness landscape itself can quite easily induce negative correlations between
$\xi_L$ and $\xi_R$. In this case there is a competition between the (anti-) correlating effect of
the landscape and the mixing effect of crossover. For instance, in the neutral case of a
Kauffman $k = 0$ landscape when $\delta f_{\xi_L}, \delta f_{\xi_R} > 0$, where $\delta f_{\xi_L} = f(\xi_L, t) - \bar{f}(t)$, then
$1 + \delta f_{\xi}/\bar{f}(t) < (1 + \delta f_{\xi_L}/\bar{f}(t))(1 + \delta f_{\xi_R}/\bar{f}(t))$, so selection induces an anti-correlation, hence in an
uncorrelated initial population, $P'(\xi, t) < P'(\xi_L, t)P'(\xi_R, t)$. In this case there will clearly be
a preference for large schemata rather than small ones. In the case of deceptive problems,
for instance the minimal deceptive problem |^|, typically there is a positive correlation.
Under such circumstances schema destruction will be the dominant effect and one should
see a preference for short, fit schemata as hypothesized by the canonical Schema theorem
and Building Block Hypothesis. In a fitness landscape that has both a deceptive and a non-deceptive
component a given schema may typically be reached via deceptive or non-deceptive channels.
For instance, for a 3-schema, $ijk$, perhaps the channel $ij + k \to ijk$ is deceptive whereas
the channel $i + jk \to ijk$ is not. If both channels are equally likely in terms of selection
then crossover will favour the non-deceptive channel and this will be manifest in the fact
that one channel contributes positively to the effective fitness whereas the other contributes
negatively. It has been hypothesized [ [l3"| that generically populations will evolve via non-deceptive
channels. Further theoretical results for $k = 0, 2$ can be found in O, n3|.



4 Experimental Results 

In this section we present experimental evidence for many of the statements and theoretical 
results we have presented. Given our desire to investigate the effects of crossover vis a 
vis its effect on schema length we considered a GA with $\mu = 0$ as mutation, being a local
operator, can have no direct effect on schema length. We will first consider the case of a non-epistatic
landscape: the well known counting ones, or unitation, problem. We considered
a population of 5000 8-bit strings, thus the maximum schema length is 8. We chose a large
population so as to be able to ignore finite size effects. Figures 1 and 2 are plots of $M(l)$
versus time where $M(l) = (n_{\rm opt}(l) - n_{\rm opt}(8))/n_{\rm opt}(8)$. Here, $n_{\rm opt}(l)$ is the number of optimal
2-schemata of defining length $l$ normalized by the total number of length $l$ 2-schemata per
string, i.e. $9 - l$. By optimal 2-schemata we mean schemata containing the global optimum
11. $n_{\rm opt}(8)$ is the number of optimal 2-schemata of defining length 8. Figure 1 is with $p_c = 0$
and Figure 2 with $p_c = 1$. We show averages over 30 different runs. Without crossover there
is essentially no preference for schemata of a given length. Adding in crossover leads to a 
remarkable change: Schemata prevalence is ordered monotonically with respect to length 
but with the larger schemata being favoured. This is in complete accord with the theoretical 
prediction of the previous section based on considerations of the evolution equation. We 
emphasize that this is purely an effect of the crossover operator and shows clearly that 
schema reconstruction is more important than schema destruction.

Figure 1: Graph of $M(l)$ versus $t$ in the unitation model with $p_c = 0$.

This effect is measured nicely by the effective fitness function in Figures 3 and 4. What 
we in fact graph is $F(l) = (f_{\rm eff}(l) - f_{\rm eff}(8))/f_{\rm eff}(8)$, where $f_{\rm eff}(l)$ is the effective fitness of
optimal 2-schemata of size $l$ and $f_{\rm eff}(8)$ is the analogous quantity for optimal 2-schemata of
size 8. Note that the effective fitness of larger schemata is greater than that of shorter ones
for the first 6 or so generations, in fact significantly so given that the fluctuations in Figure 
3 are of the order of 0.0005 whereas the relative effective fitness advantage of schemata 
of size 8 relative to size 2 is, from Figure 4, about 0.01, i.e. 20 times larger! As can be 
seen from (Q) a positive selection coefficient is associated with a schema that is growing in 
number relative to another. After 6 generations the curves in Figure 2 start to converge 
again which coincides with the effective fitness now being larger for the smaller schemata. 
Roughly speaking one can think of the effective fitness as being a measure of the gradient 
of the curves in Figures 1 and 2. If one repeats the experiments for schemata of order 3 or 
higher one will once again find a preference for long, "non-local" schemata; non-local in the
sense that there is no preference whatsoever for the three bits to be found together.

Figure 2: Graph of $M(l)$ versus $t$ in the unitation model with $p_c = 1$.
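The unitation experiment described in this section can be sketched as a small simulation. This is not the authors' implementation: the population size, number of generations, seed, pairing of parents and the retention of both offspring are assumptions made for illustration, and a single run at this size is noisy, whereas the text averages 30 runs of 5000 strings.

```python
import random

N = 8  # string length, as in the text

def n_opt(pop, l):
    """Count 11 pairs spanning l positions, normalised by the 9 - l placements."""
    placements = N + 1 - l
    count = sum(s[i] and s[i + l - 1] for s in pop for i in range(placements))
    return count / placements

def generation(pop, pc, rng):
    """Proportional selection on counting-ones fitness, then 1-point crossover."""
    weights = [sum(s) + 1e-9 for s in pop]     # tiny offset avoids all-zero weights
    parents = rng.choices(pop, weights=weights, k=len(pop))
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        if rng.random() < pc:
            k = rng.randint(1, N - 1)          # cut between positions k and k + 1
            a, b = a[:k] + b[k:], b[:k] + a[k:]
        children += [a, b]
    return children

def run(pc, pop_size=2000, generations=6, seed=0):
    """Return M(l) for l = 2..8 after the given number of generations."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(N)) for _ in range(pop_size)]
    for _ in range(generations):
        pop = generation(pop, pc, rng)
    ref = n_opt(pop, N)
    return {l: (n_opt(pop, l) - ref) / ref for l in range(2, N + 1)}

for pc in (0.0, 1.0):
    print(pc, run(pc))
```

Comparing the output for $p_c = 0$ and $p_c = 1$ over several seeds gives a rough, small-scale analogue of Figures 1 and 2; $M(8)$ is zero by construction.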

One might say that this is all fine and good but we have only shown what happens for non- 
epistatic landscapes. An important aspect of the counting ones landscape is not so much 
that it is non-epistatic as rather it is neutral in the absence of crossover, in that there is no 
preference in the landscape itself for schemata of a certain length. This means we can study 
directly the geometrical effect of crossover without having to worry about the complicating 
effects of selection. Having established generic properties of crossover we can then turn 
to various classes of fitness landscape to investigate the intricate relationship between the 
two operators. We will first introduce epistasis by considering what happens in the case of 
landscapes of the form 

$$f(C_i) = \sum_{j} 1 + \frac{\epsilon}{\mathcal{N}} \sum_{j<k} l_{jk} \qquad (14)$$

where the first sum is over all the 1s of the string $C_i$ and the second is over all pairs of 1s,
$l_{jk}$ being the defining length of the schema with 1s at the points $j$ and $k$. $\mathcal{N} = \sum_{jk} l_{jk}$
is a normalization constant, being a sum over all optimal 2-schemata of the form 11 of
the optimum string 11111111. $\epsilon$ simply controls the size of the length-dependent epistatic
term relative to the counting ones term. Summing over the lengths of the schemata gives 
a landscape where there is a preference for long schemata. On the contrary, summing over 
the inverses of the lengths gives a landscape where there is a bias for short schemata. In the 
former case the epistasis can be thought of as giving rise to an effective repulsion between 
pairs of Is and in the latter an effective attraction. In both cases the epistasis between 
string bits depends on their distance apart. 
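A possible reading of this class of landscapes is sketched below. Several details are assumptions rather than statements from the text: the defining length of a pair is counted inclusively, the normalization is taken over all pairs of the all-ones string, and the inverse-length variant is normalized by the corresponding sum of inverse lengths. The function name `epistatic_fitness` is illustrative.

```python
from itertools import combinations

def epistatic_fitness(bits, eps, inverse=False):
    """Counting-ones fitness plus a length-dependent pair term of relative size eps."""
    def term(j, k):
        length = k - j + 1                       # inclusive defining length (assumption)
        return 1.0 / length if inverse else float(length)

    ones = [i for i, b in enumerate(bits) if b]
    pair_sum = sum(term(j, k) for j, k in combinations(ones, 2))
    # Normalisation: the same sum evaluated on the all-ones (optimum) string.
    norm = sum(term(j, k) for j, k in combinations(range(len(bits)), 2))
    return len(ones) + eps * pair_sum / norm

far = (1, 0, 0, 0, 0, 0, 0, 1)    # two 1s as far apart as possible
near = (1, 1, 0, 0, 0, 0, 0, 0)   # two adjacent 1s
print(epistatic_fitness(far, 0.3), epistatic_fitness(near, 0.3))
print(epistatic_fitness(far, 0.3, inverse=True), epistatic_fitness(near, 0.3, inverse=True))
```

With the length sum the widely separated pair is the fitter of the two (an effective repulsion), while with the inverse-length sum the adjacent pair is fitter (an effective attraction); in both cases the optimum string has fitness $8 + \epsilon$.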

The results can be seen in Figures 5-8. In Figure 5 we see the effect of a bias for large
schemata of magnitude $\epsilon = 0.3$ with $p_c = 0$. The graph is quite similar to that of Figure 2,
hence one can see that crossover leads to an effective bias for larger schemata that is similar
in many respects to an effective repulsion between schema bits. In Figure 6 we see what
happens when one includes a bias for small schemata of magnitude $\epsilon = 0.75$ with $p_c = 1$.
Note that the effect of crossover is to completely annul the effect of the landscape bias. In
Figures 7 and 8 we see the evolution of $F(l)$ in both cases. In Figures 9 and 10 we see what
happens in a deceptive landscape. The landscape we chose was one where for each of the
28 different pairs of the 8-bit string we have two possible sets of conditions on the fitness of
each pair: i) $f(11) = 3$, $f(01) = f(10) = 1$, $f(00) = 2$; and ii) $f(11) = 3$, $f(01) = f(10) = 2$,
$f(00) = 1$. The first set clearly is deceptive, the second clearly not. As a function of the
total number of deceptive pairs, $n_d$, we can vary how deceptive the total landscape is. For
$n_d = 28$ the landscape is totally deceptive, whilst for $n_d = 0$ there is no deception. In Figure
10 we see the effects of crossover on a totally deceptive landscape. Note that the effect of
crossover in this case is to increase the bias for short schemata due to the fact that schema
destruction is more important than schema reconstruction.

Figure 3: Graph of $F(l)$ versus $t$ for unitation model with $p_c = 0$.

5 Conclusions 

In this paper we have analyzed the consequences of an exact evolution equation for GAs 
which applies, in a very natural way, directly to schemata thus allowing for a critical analysis 
of the Schema theorem and the Building Block hypothesis. We saw that the very structure 
of the equation, taking into account as it does schema reconstruction, contains a general 
form of the building block hypothesis; longer, higher order schemata being constructed 
from smaller, lower order schemata when schema reconstruction dominates. The ultimate 
building blocks were shown to be 1-schemata as they are immune to the effects of crossover. 
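
The immunity of 1-schemata to crossover can be checked numerically. The following is a minimal sketch rather than the implementation used in this paper: it assumes one-point crossover in which both offspring are kept, with no selection and no mutation, and all function names are hypothetical. Under these assumptions the number of strings carrying a 1 at each locus is exactly conserved, whereas higher-order schemata are free to be destroyed and reconstructed.

```python
import random


def one_point_crossover(a, b, rng):
    """Exchange the tails of two equal-length strings at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def locus_counts(population):
    """Number of strings with a 1 at each locus, i.e. 1-schema counts."""
    return [sum(s[i] for s in population) for i in range(len(population[0]))]


def crossover_generation(population, rng):
    """Pair the strings at random and replace each pair by both offspring."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    offspring = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        offspring.extend(one_point_crossover(a, b, rng))
    return offspring


rng = random.Random(0)
# 100 random 8-bit strings; an even population size so every string is paired.
population = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(100)]
before = locus_counts(population)
for _ in range(50):
    population = crossover_generation(population, rng)
after = locus_counts(population)
print(before == after)
```

If only one offspring per pair were kept, the same conservation would hold in expectation rather than exactly.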

Figure 4: Graph of F(l) versus t for unitation model with p_c = 1.

We noted that a building block interpretation was natural in the case where schema
reconstruction dominates, irrespective of whether the blocks were fit or not. In order to investigate
under what conditions fit, small schemata were combined into fit, large schemata we found it 
useful to critically examine the concept of fitness. We used explicit examples to demonstrate 
that selective fitness and the corresponding fitness landscape were inadequate to intuitively 
understand GA evolution. We therefore introduced the notion of effective fitness, showing 
that it was a more relevant concept than pure selective fitness in governing the reproductive 
success of a schema. Based on this concept and the evolution equation we introduced a new 
schema theorem that showed that schemata of high effective fitness receive an exponentially 
increasing number of trials as a function of time. We also showed that generically there is 
no preference for short, low-order schemata. In fact, if schema reconstruction dominates 
the opposite is true. Only in deceptive problems does it seem that short schemata will be 
favoured, and then only in totally deceptive problems, as the system will tend to seek out 
the non-deceptive channels if they exist. 
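
The exponential growth statement can be made explicit with one worked step. This is a sketch under an assumption: it takes the effective fitness f_eff of a schema xi to be defined through a one-step relation of the form shown in the first line, with P(xi,t) the proportion of the population in the schema and \bar f(t) the average population fitness. The symbols and the bound delta are introduced here for illustration only.

```latex
P(\xi,t+1) = \frac{f_{\rm eff}(\xi,t)}{\bar f(t)}\,P(\xi,t)
\;\Longrightarrow\;
P(\xi,t) = P(\xi,0)\prod_{t'=0}^{t-1}\frac{f_{\rm eff}(\xi,t')}{\bar f(t')}
\;\ge\; P(\xi,0)\,(1+\delta)^{t}
\quad\text{if}\quad
\frac{f_{\rm eff}(\xi,t')}{\bar f(t')}\ \ge\ 1+\delta \quad \forall\, t' < t .
```

So a schema whose effective fitness stays a fixed fraction delta above the population average gains trials at least geometrically in t, for as long as that condition holds.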

We performed various experiments to verify our theoretical results in both epistatic and 
non-epistatic landscapes. For non-epistatic landscapes we confirmed that there is indeed a 
preference for large schemata. In fact we showed that schema prevalence is a monotonically 
increasing function of schema defining length. In a class of epistatic landscapes designed 
to give an effective repulsion or attraction between pairs of bits we showed that crossover 
in its action was analogous to a bit-bit repulsion, thus favouring long schemata. For a 
model deceptive landscape we showed that in the case of total deception, contrary to all the
previous cases and as predicted on theoretical grounds, short schemata were favoured.

It would naturally be very interesting to see if other exact results besides Geiringer's
theorem follow very simply from the evolution equation. A pressing matter is the search for
approximation schemes within which the equations can be solved, as for a general landscape
an exact solution will be impossible. In this respect techniques familiar from statistical
mechanics, such as the renormalization group, might well prove very useful. Of course, much
more experimental analysis is needed on a wider set of test landscapes. In particular it will
be of interest to test the hypothesis that GAs will seek out non-deceptive trajectories if
possible.

Figure 5: Graph of M(l) versus t for biased model with bit-bit repulsion with e = 0.3,

Figure 6: Graph of M(l) versus t for biased model with bit-bit attraction with e = 0.75,
p_c = 1.

Figure 7: Graph of F(l) versus t for biased model of Figure 5.

Figure 8: Graph of F(l) versus t for biased model of Figure 6.

Figure 10: Graph of M(l) versus t in the fully deceptive model with p_c =

Acknowledgements 

This work was partially supported through DGAPA-UNAM grant number IN105197. CRS 
is grateful to an anonymous referee for bringing the work of Lee Altenberg to his attention 
and to Adam Prügel-Bennett for useful comments on the manuscript.



References 

[1] Altenberg, L. (1995) "The Schema Theorem and Price's Theorem", Foundations of
Genetic Algorithms 3, eds. D. Whitley and M. Vose, 23-49 (Morgan Kaufmann, San
Mateo).

[2] Bridges, C.L. and Goldberg, D.E. (1987) "An Analysis of Reproduction and Crossover
in a Binary-encoded Genetic Algorithm", Genetic Algorithms and their Applications,
ed. John J. Grefenstette, 9-14 (Lawrence Erlbaum Publishers, Hillsdale NJ).

[3] Eigen, M. (1971) Naturwissenschaften 58, 465. 

[4] Forrest, S. and Mitchell, M. (1993) "Relative Building Block Fitness and the Building 
Block Hypothesis", Foundations of Genetic Algorithms 2, ed. D. Whitley, 109-126 
(Morgan Kaufmann, San Mateo). 

[5] Geiringer, H. (1944) "On the Probability Theory of Linkage in Mendelian Heredity", 
Annals of Mathematical Statistics 15, 25. 

[6] Goldberg, D.E. (1987) "Simple Genetic Algorithms and the Minimal Deceptive Prob- 
lem", Genetic Algorithms and Simulated Annealing, ed. L. Davis, 74-88 (Pitman, Lon- 
don). 

[7] Goldberg, D.E. (1989) Genetic Algorithms in search, optimization and machine learn- 
ing, (Addison Wesley, Reading, MA). 

[8] Holland, J.H. (1975) Adaptation in natural and artificial systems, (MIT Press, Cam- 
bridge, MA). 

[9] Karlin, S. and Liberman, U. (1979) "Central Equilibria in Multilocus Systems I: Gen- 
eralized Nonepistatic Selection Regimes", Genetics 91, 777-798. 

[10] Leuthäusser, I. (1986) J. Chem. Phys. 84, 1884.

[11] Stephens, C.R. and Waelbroeck, H. (1997) "Effective Degrees of Freedom in Genetic
Algorithms and the Block Hypothesis", Proceedings of the Seventh International
Conference on Genetic Algorithms, ed. T. Bäck, 34 (Morgan Kaufmann, San Mateo).



[12] Stephens, C.R. and Waelbroeck, H. (1998) "Analysis of the Effective Degrees of Freedom
in Genetic Algorithms", Phys. Rev. E 57, 3251.

[13] Stephens, C.R. and Waelbroeck, H. (1998) "Schemata Evolution and Building Blocks", 
Evol. Comp. to be published. 

[14] Vose, M.D. and Liepins, G. (1991) "Punctuated Equilibria in Genetic Search", Complex
Systems 5, 31.

[15] Nix, A. and Vose, M.D. (1992) Annals of Mathematics and Artificial Intelligence 5, 79.

[16] Juliany, J. and Vose, M.D. (1994) "The Genetic Algorithm Fractal", Evolutionary
Computation 2(2), 165-180.

[17] Whitley, D. (1992) "Deception, Dominance and Implicit Parallelism in Genetic Search", 
Annals of Mathematics and Artificial Intelligence 5, 49-78. 



